# Phase 1: Core Data

> Foundation layer that fetches the master stock list and fundamental data

## Overview

Phase 1 establishes the foundation of the EDL Pipeline by fetching the complete market dataset and fundamental metrics. This phase produces the critical `master_isin_map.json` file that all subsequent phases depend on.

<Warning>
  If `fetch_dhan_data.py` fails, the entire pipeline stops: it produces `master_isin_map.json`, which every other script requires.
</Warning>

## Execution Order

Phase 1 runs these scripts sequentially:

<Steps>
  <Step title="Fetch Master Stock List">
    **Script:** `fetch_dhan_data.py`

    Fetches all NSE equity stocks in a single API call.
  </Step>

  <Step title="Fetch Fundamental Data">
    **Script:** `fetch_fundamental_data.py`

    Iterates through each ISIN to fetch quarterly results and financial ratios.
  </Step>

  <Step title="Download Listing Dates">
    **Helper:** `curl` command

    Downloads NSE equity listing dates CSV for enrichment.
  </Step>
</Steps>

***

## Script 1: fetch\_dhan\_data.py

### Purpose

Fetches the complete list of NSE equity stocks (\~2,775 symbols) with basic metrics and creates the master ISIN mapping file.

### API Endpoint

```bash theme={null}
POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt
```

### Request Payload

<CodeGroup>
  ```json Request theme={null}
  {
    "data": {
      "sort": "Mcap",
      "sorder": "desc",
      "count": 5000,
      "fields": [
        "Isin", "DispSym", "Mcap", "Pe", "DivYeild", "Revenue",
        "Year1RevenueGrowth", "NetProfitMargin", "YoYLastQtrlyProfitGrowth",
        "EBIDTAMargin", "volume", "PricePerchng1year", "Sym", "Sid", "FnoFlag"
      ],
      "params": [
        {"field": "OgInst", "op": "", "val": "ES"},
        {"field": "Exch", "op": "", "val": "NSE"}
      ],
      "pgno": 0
    }
  }
  ```

  ```python fetch_dhan_data.py (excerpt) theme={null}
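  import requests
  import json
  from pipeline_utils import get_headers

  url = "https://ow-scanx-analytics.dhan.co/customscan/fetchdt"

  # The "fields" list and "pgno" from the Request tab are not shown in this excerpt.
  payload = {
      "data": {
          "sort": "Mcap",        # order by market capitalisation
          "sorder": "desc",      # largest first
          "count": 5000,         # above the ~2,775 listed symbols, so one page covers them
          "params": [
              {"field": "OgInst", "op": "", "val": "ES"},
              {"field": "Exch", "op": "", "val": "NSE"}
          ]
      }
  }

  # Single POST request; headers come from the shared pipeline helper
  response = requests.post(url, json=payload, headers=get_headers(include_origin=True))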
  ```
</CodeGroup>

### Output Files

| File                      | Description                               | Size     | Records |
| ------------------------- | ----------------------------------------- | -------- | ------- |
| `dhan_data_response.json` | Full API response with all stock data     | \~3 MB   | 2,775   |
| `master_isin_map.json`    | **Critical:** Symbol ↔ ISIN ↔ Sid mapping | \~500 KB | 2,775   |

### master\_isin\_map.json Structure

```json theme={null}
[
  {
    "Symbol": "RELIANCE",
    "ISIN": "INE002A01018",
    "Name": "Reliance Industries Ltd.",
    "Sid": "2885",
    "FnoFlag": 1
  }
]
```
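
The reduction from response rows to this structure could look roughly like the sketch below. It assumes each row is a dict carrying the `Sym`, `Isin`, `DispSym`, `Sid`, and `FnoFlag` fields named in the request payload, and that rows without an ISIN are skipped. `build_master_map` and `save_master_map` are hypothetical helper names; the actual script may handle these details differently.

```python
import json


def build_master_map(rows):
    """Reduce raw scan rows to Symbol/ISIN/Name/Sid/FnoFlag records.

    Assumption: each row is a dict keyed by the field names requested
    from the API (Sym, Isin, DispSym, Sid, FnoFlag).
    """
    master = []
    for row in rows:
        isin = row.get("Isin")
        if not isin:
            # Without an ISIN the row cannot be looked up by later scripts.
            continue
        sid = row.get("Sid")
        master.append({
            "Symbol": row.get("Sym"),
            "ISIN": isin,
            "Name": row.get("DispSym"),
            "Sid": "" if sid is None else str(sid),  # stored as a string
            "FnoFlag": row.get("FnoFlag", 0),
        })
    return master


def save_master_map(master, path="master_isin_map.json"):
    """Write the mapping as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(master, f, ensure_ascii=False, indent=2)
```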

### Dependencies

* **Requires:** Internet connection, valid API headers
* **Depends on:** None (foundation script)

### Typical Execution Time

<Note>
  **\~5-10 seconds** — Single API call fetching 2,775 stocks
</Note>

***

## Script 2: fetch\_fundamental\_data.py

### Purpose

Fetches quarterly results, financial ratios, and TTM metrics for each stock, using the ISIN list in `master_isin_map.json` produced by `fetch_dhan_data.py`.

### API Endpoint

```bash theme={null}
POST https://open-web-scanx.dhan.co/scanx/fundamental
```

### Request Payload

<CodeGroup>
  ```json Per-Stock Request theme={null}
  {
    "data": {
      "isin": "INE002A01018"
    }
  }
  ```

  ```python Iteration Logic theme={null}
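  import json
  from concurrent.futures import ThreadPoolExecutor

  import requests

  url = "https://open-web-scanx.dhan.co/scanx/fundamental"

  # Load the Symbol/ISIN records written by fetch_dhan_data.py
  with open("master_isin_map.json", "r") as f:
      master_map = json.load(f)

  def fetch_fundamental(stock):
      """Fetch fundamentals for one stock, identified by its ISIN."""
      payload = {"data": {"isin": stock["ISIN"]}}
      response = requests.post(url, json=payload, timeout=30)
      return response.json()

  # 20 worker threads issue the per-stock requests concurrently
  with ThreadPoolExecutor(max_workers=20) as executor:
      results = list(executor.map(fetch_fundamental, master_map))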
  ```
</CodeGroup>

### Data Fetched

* **Quarterly Results:** Latest 4 quarters + YoY comparison
* **Income Statement:** Revenue, Net Profit, OPM, EPS
* **Balance Sheet:** Total Assets, Liabilities, Equity
* **Ratios:** ROE, ROCE, Debt/Equity, P/E, P/B
* **TTM Metrics:** Trailing twelve month calculations

### Output Files

| File                    | Description                  | Size    | Records |
| ----------------------- | ---------------------------- | ------- | ------- |
| `fundamental_data.json` | Complete fundamental dataset | \~35 MB | 2,775   |

### Output Structure

```json theme={null}
{
  "Symbol": "RELIANCE",
  "ISIN": "INE002A01018",
  "incomeStat_cq": {
    "Net_Profit": "17594|16446|15138|17955|12273",
    "Eps": "26.10|24.40|22.46|26.64|18.21"
  },
  "CV": {
    "PE": "28.5",
    "ROE": "8.2",
    "ROCE": "9.1"
  },
  "roce_roe": {...},
  "TTM_cy": {...}
}
```
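
Series such as `Net_Profit` and `Eps` arrive as pipe-delimited strings, so consumers need to split them before doing arithmetic. The sketch below shows one way to do that. The helper names are hypothetical, and the ordering used in `yoy_growth_pct` (first entry is the latest quarter, last entry is the same quarter a year earlier) is an assumption that the source does not confirm.

```python
def parse_series(value):
    """Split a pipe-delimited string such as "26.10|24.40" into floats.

    Entries that are empty or non-numeric become None.
    """
    parsed = []
    for part in value.split("|"):
        try:
            parsed.append(float(part))
        except ValueError:
            parsed.append(None)
    return parsed


def yoy_growth_pct(series):
    """Percent change from the last entry to the first.

    Assumption: series[0] is the latest quarter and series[-1] is the
    same quarter one year earlier.
    """
    if len(series) < 2:
        return None
    latest, year_ago = series[0], series[-1]
    if latest is None or year_ago in (None, 0):
        return None  # growth is undefined without a usable base value
    return (latest - year_ago) / abs(year_ago) * 100
```

Returning `None` for unusable values keeps gaps visible to downstream code instead of silently treating them as zero.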

### Dependencies

* **Requires:** `master_isin_map.json` (from fetch\_dhan\_data.py)
* **Timeout:** 30 seconds per request
* **Threading:** 20 concurrent workers
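
With `executor.map`, the first worker exception is re-raised while results are being collected, so a single timeout can discard the rest of the run. A possible alternative, sketched below, records failures per ISIN and keeps going. `fetch_all` is a hypothetical helper, and `fetch_one` stands in for whatever function performs the per-stock request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(stocks, fetch_one, max_workers=20):
    """Run fetch_one(stock) for every stock, collecting failures separately.

    fetch_one is any callable that takes one master-map record; here it
    stands in for the per-ISIN request.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its ISIN so errors can be attributed.
        future_to_isin = {
            executor.submit(fetch_one, stock): stock["ISIN"] for stock in stocks
        }
        for future in as_completed(future_to_isin):
            isin = future_to_isin[future]
            try:
                results[isin] = future.result()
            except Exception as exc:  # e.g. timeouts, HTTP errors, bad JSON
                failures[isin] = repr(exc)
    return results, failures
```

The `failures` dict can then be logged or retried in a second pass without refetching the stocks that already succeeded.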

### Typical Execution Time

<Note>
  **\~2-3 minutes** — Fetching 2,775 stocks with 20 threads
</Note>

***

## NSE Listing Dates Download

### Purpose

Downloads the official NSE equity listing dates CSV for enrichment in Phase 3.

### Command

```bash theme={null}
curl -s -o nse_equity_list.csv \
  "https://nsearchives.nseindia.com/content/equities/EQUITY_L.csv" \
  --http1.1 \
  --header "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
```

### Output Files

| File                  | Description                   | Format |
| --------------------- | ----------------------------- | ------ |
| `nse_equity_list.csv` | Symbol → Listing Date mapping | CSV    |

### CSV Structure

```csv theme={null}
SYMBOL, NAME OF COMPANY, DATE OF LISTING
RELIANCE,Reliance Industries Limited,29-NOV-1977
TCS,Tata Consultancy Services Limited,25-AUG-2004
```

<Note>
  This is a non-critical download. Pipeline continues even if this fails.
</Note>
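
Reading the downloaded file into a symbol-to-date lookup could be done along these lines. This is a sketch only: it assumes the `SYMBOL` and `DATE OF LISTING` column names and the `29-NOV-1977` date style shown above, plus English month abbreviations in the active locale. `parse_listing_dates` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime


def parse_listing_dates(csv_text):
    """Return {symbol: date} from EQUITY_L-style CSV text.

    Assumptions: SYMBOL and DATE OF LISTING columns exist, and dates
    look like 29-NOV-1977.
    """
    # skipinitialspace drops the blank that follows each comma in the header.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    listing_dates = {}
    for row in reader:
        symbol = (row.get("SYMBOL") or "").strip()
        raw_date = (row.get("DATE OF LISTING") or "").strip()
        if not symbol or not raw_date:
            continue
        try:
            # "29-NOV-1977" -> "29-Nov-1977" so it matches %b.
            parsed = datetime.strptime(raw_date.title(), "%d-%b-%Y")
        except ValueError:
            continue  # skip rows whose date cannot be parsed
        listing_dates[symbol] = parsed.date()
    return listing_dates
```

Since the download is non-critical, skipping unparseable rows rather than raising keeps the behaviour consistent with a missing file: the affected symbols simply have no listing date.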

***

## Phase 1 Output Summary

### Files Produced

```plaintext theme={null}
📦 Phase 1 Outputs:
├─ dhan_data_response.json        (~3 MB)
├─ master_isin_map.json           (~500 KB) ⚠️ CRITICAL
├─ fundamental_data.json          (~35 MB)
└─ nse_equity_list.csv            (~200 KB)
```

### Critical Dependencies for Phase 2+

<Warning>
  All Phase 2 scripts require `master_isin_map.json` to iterate through stocks:

  * `fetch_company_filings.py`
  * `fetch_new_announcements.py`
  * `fetch_advanced_indicators.py`
  * `fetch_market_news.py`
  * ... and 6 more scripts
</Warning>
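
Because so many scripts read this file, a downstream script may want to validate it before iterating. The loader below is a minimal sketch of that idea. `load_master_map` and the choice of `REQUIRED_KEYS` are assumptions for illustration and are not taken from the pipeline source.

```python
import json
import os

# Assumed minimum set of keys a downstream script needs per record.
REQUIRED_KEYS = {"Symbol", "ISIN", "Sid"}


def load_master_map(path="master_isin_map.json"):
    """Load the master map and fail early if it is missing or malformed."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found; run fetch_dhan_data.py first")
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list) or not records:
        raise ValueError(f"{path} must contain a non-empty JSON array")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
    return records
```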

***

## Error Handling

### Critical Failure: fetch\_dhan\_data.py

If this script fails, the pipeline stops immediately:

```python theme={null}
results["fetch_dhan_data.py"] = run_script("fetch_dhan_data.py", "Phase 1")

if not results["fetch_dhan_data.py"]:
    print("🛑 CRITICAL: fetch_dhan_data.py failed. Cannot continue.")
    print("   This script produces master_isin_map.json which ALL other scripts need.")
    return
```

### Non-Critical: Other Failures

* **fetch\_fundamental\_data.py fails:** Pipeline continues, but fundamental fields will be empty
* **NSE CSV download fails:** Pipeline continues, listing dates will be missing
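
The orchestrator snippet above relies on `run_script` returning a truthy value on success. Its implementation is not shown in this page, so the version below is a hypothetical stand-in. It assumes each script runs as a child process and that a non-zero exit code means failure; the `timeout` parameter is likewise an assumption.

```python
import subprocess
import sys


def run_script(script, phase, timeout=None):
    """Run a pipeline script in a child process and report success as a bool.

    Hypothetical stand-in for the orchestrator's helper of the same name.
    """
    print(f"[{phase}] running {script}")
    try:
        completed = subprocess.run([sys.executable, script], timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"[{phase}] {script} did not complete: {exc}")
        return False
    # Exit code 0 is treated as success; anything else as failure.
    return completed.returncode == 0
```

With a boolean result, the caller decides per script whether a failure is critical (stop, as for `fetch_dhan_data.py`) or non-critical (log and continue).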

***

## Performance Metrics

### Total Phase 1 Time

<Note>
  **\~2-4 minutes** for 2,775 stocks (including NSE CSV download)
</Note>

### Bottlenecks

* **fetch\_fundamental\_data.py:** One request per stock (\~2,775 calls); 20 threads hide per-request latency, but API rate limits cap how far concurrency helps
* **Network latency:** Depends on connection speed

### Optimization Tips

1. **Increase threading** (if API allows):
   ```python theme={null}
   ThreadPoolExecutor(max_workers=30)  # Up from 20
   ```

2. **Cache master\_isin\_map.json** between runs:
   ```python theme={null}
   import os

   if os.path.exists("master_isin_map.json"):
       print("Using cached master map...")
   ```

3. **Skip fundamental refetch** for unchanged stocks (requires change detection)
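
An existence check alone would reuse a cached map indefinitely, even after new listings appear. One possible refinement is to also check the file's age, as sketched below. `is_fresh` is a hypothetical helper, and the 24-hour default is an arbitrary assumption, not a value from the pipeline.

```python
import os
import time


def is_fresh(path, max_age_seconds=24 * 60 * 60):
    """Return True if path exists and was modified within max_age_seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return False  # a missing or unreadable file counts as stale
    return age <= max_age_seconds
```

A caller could skip `fetch_dhan_data.py` when `is_fresh("master_isin_map.json")` is true and refetch otherwise.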

***

## Next Phase

Once Phase 1 completes, the pipeline automatically proceeds to:

<Card title="Phase 2: Data Enrichment" icon="satellite-dish" href="/pipeline/phase-2-enrichment">
  Fetches company filings, announcements, indicators, news, and surveillance data using the master ISIN map.
</Card>
